Overlapping Pools for High Throughput Targeted Resequencing
Abstract
Resequencing genomic DNA from pools of individuals is an effective strategy to detect new variants in targeted regions and compare them between cases and controls. There are numerous ways to assign individuals to the pools on which they are to be sequenced. The naïve, disjoint pooling scheme (many individuals to one pool) in predominant use today offers insight into allele frequencies, but does not reveal the identity of an allele carrier. We present a framework for overlapping pool design, where each individual sample is resequenced in several pools (many individuals to many pools). Upon discovering a variant, the set of pools where this variant is observed reveals the identity of its carrier. We formalize the mathematical framework for such pool designs and list the requirements for them. We specifically address three practical concerns for pooled resequencing designs: (1) false positives due to errors introduced during amplification and sequencing; (2) false negatives due to undersampling of particular alleles, aggravated by nonuniform coverage; and consequently, (3) ambiguous identification of individual carriers in the presence of errors. We build on the theory of error-correcting codes to design pools that overcome these pitfalls. We show that for practical parameters of resequencing studies, our designs guarantee a high probability of unambiguous singleton carrier identification while maintaining the features of naïve pools in terms of sensitivity, specificity, and the ability to estimate allele frequencies. We demonstrate the ability of our designs to extract rare variations using short read data from the 1000 Genomes Pilot 3 project.
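The carrier-identification idea in the abstract can be illustrated with a toy sketch. This is not the authors' design: the function names `design_pools` and `identify_carrier`, the fixed-weight signature scheme, and the `max_errors` parameter are all assumptions made for illustration. Each individual gets a distinct set of pools (a signature), and a variant's carrier is looked up from the set of pools in which the variant was observed.

```python
from itertools import combinations


def design_pools(n_individuals, n_pools, weight):
    """Assign each individual a distinct set of `weight` pools (its signature).

    Hypothetical toy scheme: signatures are taken in lexicographic order,
    which is NOT an error-correcting design.
    """
    signatures = {}
    all_subsets = combinations(range(n_pools), weight)
    for individual, pools in zip(range(n_individuals), all_subsets):
        signatures[individual] = frozenset(pools)
    if len(signatures) < n_individuals:
        raise ValueError("not enough distinct signatures for this many individuals")
    return signatures


def identify_carrier(signatures, observed_pools, max_errors=0):
    """Return the single individual whose signature matches the observed pools.

    A signature matches if it differs from the observed set in at most
    `max_errors` pools (false positives plus false negatives).
    Returns None when no individual or more than one individual matches.
    """
    observed = frozenset(observed_pools)
    matches = [
        individual
        for individual, signature in signatures.items()
        if len(signature ^ observed) <= max_errors
    ]
    return matches[0] if len(matches) == 1 else None


if __name__ == "__main__":
    signatures = design_pools(n_individuals=10, n_pools=5, weight=2)
    # Error-free observation: the variant shows up in pools 1 and 2.
    print(identify_carrier(signatures, {1, 2}))
    # One pool dropped out (a false negative): several signatures fit.
    print(identify_carrier(signatures, {1}, max_errors=1))
```

In this toy scheme, ten individuals share five pools, and an error-free observation pins down one carrier. A single missed pool already makes the answer ambiguous, because many signatures differ from each other in only two pools. That is the kind of failure the abstract says error-correcting codes are used to avoid, presumably by keeping signatures further apart.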
Related resources
High-throughput, high-accuracy array-based resequencing.
Although genomewide association studies have successfully identified associations of many common single-nucleotide polymorphisms (SNPs) with common diseases, the SNPs implicated so far account for only a small proportion of the genetic variability of tested diseases. It has been suggested that common diseases may often be caused by rare alleles missed by genomewide association studies. To ident...
Identification of genes for controlling swine adipose deposition by integrating transcriptome, whole-genome resequencing, and quantitative trait loci data
Backfat thickness is strongly associated with meat quality, fattening efficiency, reproductive performance, and immunity in pigs. Fat storage and fatty acid synthesis mainly occur in adipose tissue. Therefore, we used a high-throughput massively parallel sequencing approach to identify transcriptomes in adipose tissue, and whole-genome differences from three full-sibling pairs of pigs with oppo...
MAGERI: Computational pipeline for molecular-barcoded targeted resequencing
Unique molecular identifiers (UMIs) show outstanding performance in targeted high-throughput resequencing, being the most promising approach for the accurate identification of rare variants in complex DNA samples. This approach has application in multiple areas, including cancer diagnostics, thus demanding dedicated software and algorithms. Here we introduce MAGERI, a computational pipeline tha...
Understanding and utilizing crop genome diversity via high-resolution genotyping.
High-resolution genome analysis technologies provide an unprecedented level of insight into structural diversity across crop genomes. Low-cost discovery of sequence variation has become accessible for all crops since the development of next-generation DNA sequencing technologies, using diverse methods ranging from genome-scale resequencing or skim sequencing, reduced-representation genotyping-b...
Quantification of differential gene expression by multiplexed targeted resequencing of cDNA
Whole-transcriptome or RNA sequencing (RNA-Seq) is a powerful and versatile tool for functional analysis of different types of RNA molecules, but sample reagent and sequencing cost can be prohibitive for hypothesis-driven studies where the aim is to quantify differential expression of a limited number of genes. Here we present an approach for quantification of differential mRNA expression by ta...
Journal: Genome Research
Volume 19, Issue 7
Pages: -
Publication date: 2009